Code Similarity Comparison of Multiple Source Trees

Author

  • Warren Toomey
Abstract

This paper outlines the design of a code comparison tool, ctcompare, which uses short sequences of lexical tokens from source code as keys in an inverted index to perform the code comparison. This technique allows multiple source code trees to be compared simultaneously. Other significant features of the tool include a serialised token stream format, which allows a source tree to be analysed independently without revealing the full source code, and isomorphic code comparison, which identifies code whose identifiers have been renamed.
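The idea of keying an inverted index on short token sequences can be illustrated with a small sketch. This is not ctcompare's implementation: the toy lexer, the `KEYWORDS` set, the function names, the run length `n=4`, and the per-run identifier renaming used to approximate isomorphic comparison are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy lexer: identifiers, integers, and single punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Assumed, deliberately tiny keyword set; a real lexer would know the full language.
KEYWORDS = {"if", "else", "for", "while", "return", "int"}


def tokenize(source):
    """Split source text into a flat list of lexical tokens."""
    return TOKEN_RE.findall(source)


def normalize(run):
    """Rename identifiers by order of first appearance within the run,
    so that consistently renamed code produces the same key."""
    names = {}
    out = []
    for tok in run:
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            out.append("id%d" % names.setdefault(tok, len(names)))
        else:
            out.append(tok)
    return tuple(out)


def build_index(trees, n=4, isomorphic=False):
    """Map every run of n consecutive tokens to the places it occurs.

    trees: {tree_name: {file_name: source_text}}
    returns: {token_run: [(tree_name, file_name, token_offset), ...]}
    """
    index = defaultdict(list)
    for tree, files in trees.items():
        for path, text in files.items():
            tokens = tokenize(text)
            for i in range(len(tokens) - n + 1):
                run = tuple(tokens[i:i + n])
                key = normalize(run) if isomorphic else run
                index[key].append((tree, path, i))
    return index


def shared_runs(index):
    """Keep only the token runs that occur in more than one tree."""
    return {
        key: hits
        for key, hits in index.items()
        if len({tree for tree, _, _ in hits}) > 1
    }
```

Because every tree is indexed into the same table, any number of trees can be compared in a single pass: a key whose hit list spans several trees marks a candidate match. The index is built from tokens alone, which is in the spirit of the serialised token stream the abstract mentions, although the actual stream format is not described here.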


Similar resources

Code Similarity Detection in Multiple Large Source Trees using Token Hashes

The ability to find similarities between two source code bases, or within one code base, has many uses including the detection of student plagiarism, the identification of intellectual property violations and the location of repeated code in a code base amenable to refactoring. Previous structure-metric approaches have used either suffix trees or modified Longest Common Subsequence algorithms t...


TOPD/FMTS: a new software to compare phylogenetic trees

SUMMARY TOPD/FMTS has been developed to evaluate similarities and differences between phylogenetic trees. The software implements several new algorithms (including the Disagree method, which returns the taxa that disagree between two trees, and the Nodal method, which compares two trees using nodal information) and several previously described methods (such as the Partition method, Triplets or Quar...


Deep Learning Similarities from Different Representations of Source Code

Assessing the similarity between code components plays a pivotal role in a number of Software Engineering (SE) tasks, such as clone detection, impact analysis, refactoring, etc. Code similarity is generally measured by relying on manually defined or hand-crafted features, e.g., by analyzing the overlap among identifiers or comparing the Abstract Syntax Trees of two code components. These featur...


A Comparison of Similarity Techniques for Detecting Source Code Plagiarism

Academic dishonesty is a universal problem. Detecting duplicated text among natural language artifacts is a well-documented task. However, performing similar analysis on source code presents unique problems. In this paper, I present a comparison of the application of various techniques in textual similarity processing on source code. Beyond this, I investigate the application of textual similari...


Pinda: A Web service for detection and analysis of intraspecies gene duplication events

We present Pinda, a Web service for the detection and analysis of possible duplications of a given protein or DNA sequence within a source species. Pinda fully automates the whole gene duplication detection procedure, from performing the initial similarity searches, to generating the multiple sequence alignments and the corresponding phylogenetic trees, to bootstrapping the trees and producing ...




Publication date: 2008